Downloaded from http://biorxiv.org/on September 18, 2014 



bioRviv 

f V beta 

THE PREPRINT SERVER FOR BIOLOGY 

Phylogenetic confidence intervals for the optimal trait value 

Krzysztof Bartoszek and Serik Sagitov 
bioRxiv first posted online May 1 9, 201 4 

Access the most recent version atdoi: http://dx.doi.org/10.1101/005314 



Copyright The copyright holder for this preprint is the author/funder. All rights reserved. No 
reuse allowed without permission. 



Downloaded from http://biorxiv.org/on September 18, 2014 



Phylogenetic confidence intervals for the 
optimal trait value 

Krzysztof Bartoszek and Serik Sagitov 
May 19, 2014 

Abstract 

We consider a stochastic evolutionary model for a phenotype developing 
amongst n related species with unknown phylogeny. The unknown tree is 
modelled by a Yule process conditioned on n contemporary nodes. The 
trait value is assumed to evolve along lineages as an Ornstein-Uhlenbeck 
process. As a result, the trait values of the n species form a sample with 
dependent observations. We establish three limit theorems for the sample 
mean corresponding to three domains for the adaptation rate. In the case 
of fast adaptation, we show that for large n the normalized sample mean is 
approximately normally distributed. Using these limit theorems, we develop 
novel confidence interval formulae for the optimal trait value. 

Keywords : Central limit theorem, Conditioned Yule process, macroevolution, 
martingales, Ornstein-Uhlenbeck process, phylogenetics 

1 Introduction 

Phylogenetic comparative methods deal with multi-species trait value data. This 
is an established and rapidly expanding area of research concerning evolution of 
phenotypes in groups of related species living under various environmental con- 
ditions. An important feature of such data is the branching structure of evolution 
causing dependence among the observed trait values. For this reason the usual 
starting point for phylogenetic comparative studies is an inferred phylogeny de- 
scribing the evolutionary relationships. The likelihood can be computed by as- 
suming a model for trait evolution along the branches of this fixed tree, such as 
the Ornstein-Uhlenbeck process. 
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The one-dimensional Ornstein-Uhlenbeck model is characterized by four pa- 
rameters: the optimal value 0, the adaptation rate a > 0, the ancestral value Xq, 
and the noise size o. The classical Brownian motion model [11] can be viewed 
as a special case with a = 0 and 6 being irrelevant. As with any statistical pro- 
cedure, it is important to be able to compute confidence intervals for these pa- 
rameters. However, confidence intervals are often not mentioned in phylogenetic 
comparative studies [5]. 

There are a number of possible numerical ways of calculating such confidence 
intervals when the underlying phylogenetic tree is known. Using a regression 
framework one can apply standard regression theory methods to compute confi- 
dence intervals for (0,Xq) conditionally on (a, a 2 ) [12, 16, 21, 24]. Notably in 
[13] the authors derive analytical formulae for confidence intervals for Xq under 
the Brownian motion model. In more complicated situations a parametric boot- 
strap is a (computationally very demanding) way out [5, 8, 20]. Another approach 
is to report a support surface [16, 17], or consider the curvature of the likelihood 
surface [4]. 

All of the above methods have in common that they assume that the phylogeny 
describing the evolutionary relationships is fully resolved. Possible errors in the 
topology can cause problems - the closer to the tips they occur, the more prob- 
lematic they can be [34]. On the other hand, the regression estimators will remain 
unbiased even with a misspecified tree [25] and also seem to be robust with respect 
to errors in the phylogeny at least for the Brownian motion model [33]. There are 
only few papers addressing the issue of phylogenetic uncertainty. An MCMC pro- 
cedure to jointly estimate the phylogeny and parameters of the Brownian model of 
trait evolution was suggested in [19, 18]. Recently, [27] develops an Approximate 
Bayesian Computation framework to estimate Brownian motion parameters in the 
case of an incomplete tree. 

Our paper studies a situation when nothing is known about the phylogeny. The 
simplest stochastic model addressing this case is a combination of a Yule tree and 
the Brownian motion on top of it: already in the 1970s, a joint maximum likeli- 
hood estimation procedure of a Yule tree and Brownian motion on top of it was 
proposed in [10]. This basic evolutionary model allows for far reaching analytical 
analysis [9, 26]. A more realistic stochastic model of this kind combines the Brow- 
nian motion with a birth-death tree allowing for extinction of species [7]. For the 
latter model [26] explicitly compute the so-called interspecies correlation coeffi- 
cient. Such "tree-free" models are appropriate for working with fossil data when 
there may be available rich fossilized phenotypic information but the molecular 
material might have degraded so much that it is impossible to infer evolutionary 
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relationships. In [9] the usefulness of the tree-free approach for contemporary 
species is demonstrated in an Carnivora order case study. 

Conditioned birth-death processes as stochastic models for species trees, have 
received significant attention in the last decade [3, 14, 22, 29, 30, 31]. In this work 
the unknown tree is modeled by the Yule process conditioned on n extant species 
while the evolution of a trait along a lineage is viewed as the Ornstein-Uhlenbeck 
process, see Fig. 1. We study the properties of the sample mean and sample 
variance computed from the vector of n trait values. Our main results are three 
asymptotic confidence interval formulae for the optimal trait value 9. These three 
formulae represent three asymptotic regimes for different values of the adaptation 
rate a. 

In the discussion in [9] it is pointed out that "as evolutionary biologists further 
refine our knowledge of the tree of life, the number of clades whose phylogeny is 
truly unknown may diminish, along with interest in tree-free estimation methods." 
In our opinion, the easy-to-compute tree-free predictions will always play an im- 
portant role of a sanity check to see whether the conclusions based on the inferred 
phylogeny deviate much from those from a "typical" phylogeny. Moreover, re- 
sults like those presented here can also be used as a method of testing software for 
phylogenetic comparative models. 

A detailed description of the evolutionary model along with our main results 
are presented in Section 2. Section 3 contains new formulae for the Laplace trans- 
forms of important characteristics of the conditioned Yule species tree: the time 
to origin U n and the time to the most recent common ancestor for a pair of 
two species chosen at random out of n extant species. In Section 4 we calculate 
the interspecies correlation coefficient for the Yule-Ornstein-Uhlenbeck model 
and Section 5 contains the proof of our limit theorems. In Section 6 we estab- 
lish the consistency of the stationary variance estimator, which is needed for our 
confidence interval formulae, cf [16] where the residual sum of squares was sug- 
gested to estimate the stationary variance. In Appendix A we calculate all the joint 
moments of U n and x^ 1 '. 

Our main result, Theorem 2.1, should be compared with the limit theorems 
obtained in [ 1 , 2] . They also revealed three asymptotic regimes in a related, though 
different setting, dealing with a branching Ornstein-Uhlenbeck process. In their 
case the time of observation is deterministic and the number of the tree tips is 
random, while in our case the observation time is random and the number of the 
tips is deterministic. Although it is possible (with some effort) to deduce our 
results from [1, 2], our proof provides a much more elementary derivation. We 
believe that our approach will be useful in addressing other biologically relevant 



3 



Downloaded from http://biorxiv.org/on September 18, 2014 




1 2 3 4 5 




Figure 1: On the left: a branching Ornstein-Uhlenbeck process simulated on a 
realization of the Yule rc-tree with n = 5 tips using the TreeSim [29, 30] and mvS- 
LOUCH [4] R [23] packages. Parameters used are a = 1, <J = 1,Xq — 6 = 2, after 
the tree height U n was scaled to 1 . On the right: the species tree disregarding the 
trait values supplied with the notation for the inter-speciation times. For the pair 
of tips (2,3) the time x^ 1 ' to their most recent common ancestor is marked on the 
time axis (starting at present and going back to the time of origin). 

issues like the formulae for the higher moments given in Appendix A. 

2 The model and main results 

This work deals with what we call the Yule-Ornstein-Uhlenbeck model which is 
characterized by four parameters (Xq, a, a, 6) and consists of two ingredients 

1. the species tree connecting n extant species is modeled by the pure birth 
Yule process [35] with a unit speciation rate A = 1 and conditioned on hav- 
ing n tips [14], 

in) (n) 

2. the observed trait values (X| ',... y X„ ) on the tips of the tree evolved from 
the ancestral state Xo following the Ornstein-Uhlenbeck process with pa- 
rameters (a, o, 6). 

Definition 2.1 Let (T\ , . . . , T n ) be independent exponential random variables with 
parameters (1, . . . , n). We define the Yule n-tree as a random tree with n tips which 
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is constructed using a bottom-up algorithm based on the following two simple 
rules. 

(1) During the time period 7^ the tree has k branches. 

(2) For k G [2, ft] the reduction from ktok—1 branches occurs as two randomly 
chosen branches coalesce into one branch. 

The height the Yule n-tree is now U n = Ti + ... + T n . 

As shown in [14], this definition corresponds to the standard Yule tree conditioned 
on having n tips at the moment of observation, assuming that the time to the origin 
has the improper uniform prior. 

Following [8, 16], we model trait evolution along a lineage using the Ornstein- 
Uhlenbeck process X(t) given by the stochastic differential equation 

dX(t) = -a(X(t)-6)dt + adB(t), X(0)=X 0 . (1) 

Here a > 0 is the adaptation rate, 6 is the optimal trait value, a 2 is the noise 
variance, and B(t) is the standard Wiener process. The distribution of X(t) is 
normal with, 

E[X(t)} = e+e- at (X 0 -e), Var[X(0] = ^(l- e - 2m ), (2) 

implying that X(t) looses the effect of the ancestral state Xq at an exponential 
rate. In the long run the Ornstein-Uhlenbeck process acquires a stationary normal 
distribution with mean 0 and variance o 2 /2a. 

We propose asymptotic confidence interval formulae for the optimal value 0 
which take into account phylogenetic uncertainty. To this end we study properties 
of the sample mean and sample variance 

Using the properties of the Yule-Ornstein-Uhlenbeck model we find explicit ex- 
pressions for E [X n ] , Var pf n ] , E [S^l , study the asymptotics of Var [S 2 ] , and 
prove the following limit theorem revealing three different asymptotic regimes. 

Theorem 2.1 Let 8 = f°7 p be a normalized difference between the ancestral 

y/a 2 /2a 

and optimal values. Consider the normalized sample mean Y n = , = of the 
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Yule-Ornstein-Uhlenbeck process with Yq = 5. As n — )■ °° the process Y n has the 
following limit behavior. 

(i) If (X > 0.5, then \fn-Y n is asymptotically normally distributed with zero 
mean and variance 2 a-i • 



(ii) If a = 0.5, then y n/\nn ■ Y n is asymptotically normally distributed with 
zero mean and variance 2. 

(Hi) Ifoc< 0.5, then n a ■ Y n converges a.s. and in I? to a random variable Y a § 

withE[Y a>s ] =Sr(l + a) aw/EfeJ = (8 2 + ^) T(l + 2a). 

Let z x be the x-quantile of the standard normal distribution, and q x be the x- 
quantile of the limit Y a §. Denote by S n the sample standard deviation defined 
as the square root of S 2 . As it will be shown in Section 6, the sample variance 
S 2 is a consistent estimator of j^. This fact together with Theorem 2.1 allows us 
to state the following three approximate (1 — x) -level confidence intervals for 0 
assuming that we know the value of a: 



- K a 2a + 1 

for a > 0.5 X„±z 1 _ x/2 -5„- -^=, Ka = \j 2a -V 

\/2~\Kn 

for a = 0.5 X n ±z\- x /r-S n 

for a < 0.5 (X n -q x / 2 -S n -n^ a 1 X n + q\- x / 2 • S n -n~ a ). 

Notably, the first of these confidence intervals differs from the classical confi- 
dence interval for the mean (X n ±Z\- x /2S n / *Jn) just by a factor K a . The latter 
is larger than 1, as it should, in view of a positive correlation among the sample 
observations. The correction factor K a becomes negligible in the case of a very 
strong adaptation, «> 1, when the dependence due to common ancestry can be 
neglected. 

Remark 2.1 Observe that our standing assumption A = 1, see Definition 2.1, 
of having one speciation event per unit of time causes no loss of generality. To 
incorporate an arbitrary speciation rate X one has to replace in our formulae 
parameters a and O 2 by a/X and G 2 /X. This transformation corresponds to the 
time scaling by factor X in Eq. (1), it changes neither the optimal value 9 nor the 
stationary variance o 2 / (2a). 
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3 Sampling m leaves from the Yule n-tree 

Here we consider the Yule n-tree, see Definition 2.1 and study some properties 
of its subtree joining m randomly (without replacement) chosen tips, where m G 
[2, «]. In particular, we compute the joint Laplace transform of the height of the 
Yule n-tree U n = T\ + . . . + T„ and z^ n \ the height of the most recent common 
ancestor for two randomly sampled tips, see Fig. 1. For other results concerning 
the distribution of tW and U„ see also [14, 15, 22, 26, 28, 29, 31, 32]. 

Lemma 3.1 Consider a random m-subtree of the conditioned Yule n-tree. It has 
m — \ bifurcating events. Let K^ ,m ^ < . . . < K^f^ be the consecutive numbers of 
the bifurcation events in the Yule n-tree (counted from the root toward the leaves) 
corresponding to the m—l bifurcating events of the m-subtree. Put Kq ' =0 

and Ktn' m ^ = n. The sequence (Km , ■ ■ .,Kq ) forms a time inhomogeneous 
Markov chain with transition probabilities 

P(^; n) = k\Kf> m) = i) =p\{>, l<j<k<i<n, 
where pf^ = 1 for all i > 1, and 

^^n ^-r . >=* - 

Proof Tracing the lineages of m randomly sampled tips of the Yule n-tree towards 
the root, the first coalescent event can be viewed as the success in a sequence of 
independent Bernoulli trials. This argument leads to the expression cf [29] 



T3fv-(n,m) _ ,i r (n,m) _ s_ ( (l)] ( , (2) | (2) 



m(m-l) »p} (i+l-m)(i + m) 

II a ' k = m-l,...,n-l, 



nk i z 

i=k+l 



confirming the formula stated for the transition probabilities . The transition 
probabilities p^ 
that^_ 1 = l. 



probabilities p^P for j = 2, . . . , m — 1 are obtained similarly. Notice, as a check, 



□ 
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Lemma 3.2 Consider the inter-bifurcation times for the m-subtree of the Yule n- 
tree 



(n,m) rp . | rrt 

Xj — 1 K (n,m) + l + ■ ■ ■ + 1 K (n,m), J — 

so that U n = x[ n ' m ^ + ■ ■ ■ + Xm for any m < n, and = %2 ■ Tnen f or 
Xj > — 1 we have 



(n,m) 



expj 



where k m = n, Icq = 0, and 
1 2 



n— 1 n— 1 n— 1 m U, 

EE- e n^v,^ 

&1 = 1 k 2 =ki + 1 k m - 1 =^ m _2 + 1 j= 1 



/— 1 j— 1 



n r(n + i)r(x+i) 



£ = 

l+jc 2 + x "' n + x T(n+x+l) 



, x>-l. 



Proof The Laplace transform of the sum of independent exponentials: 



exp{-^^' m) - • • ■ -Xmxt m) Wt"l\. ..4 n ' m) ) = (km-U- -K 



r[ Xj ._ 1+ ^._ 1+ i Xj -i+, 



together with Lemma 3.1 implies the stated equality 



exp{-£x^j n ' m) } 



n— 1 n— 1 
k 



n— 1 m 



xj-i+kj 



□ 
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Lemma 3.3 The joint Laplace transform of the height of the Yule n-tree and the 
height of the most recent common ancestor for two randomly sampled tips is given 
by 



2{n+l)b„s +y j} 



(n-1) fe fi (k + 2)(k+l)b kiX+y 



In particular, 



E[e- xU »]=b n s, 
Vn[e- xU "] =b, h2x -bl x , 
and, denoting the harmonic number h n := 1 + 1/2 + . .. + !/«, 



(3) 
(4) 



e J 



2-(n+l)(y+l)4w, 



(»-l)(y-l) ' 



(5) 



Proof Turning to Lemma 3.1 with m = 2 we get 



(2 , = 2ti (f-l)(f + 2) 
and according to Lemma 3.2 



2(n+l) 



(/z-l)(fc + 2)(fc+l)' 



£ = 1, . . . ,71— 1, 



-x(t/„-TW)-yTW 



n-1 



Erf 



71 



Ar=l 



( 2 ) 1 k+1 

k x+l x + ky + k+l y + n 



2^+1)^^ 



bkx 



(n-\) ^(k + 2)(k+l)b kj 



(6) 



This implies the main formula claimed by Lemma 3.3 giving E \e xU "~\ = b, ljX after 
putting y = 0. With x = 0, 



-vtv 



2(71+1)^^ 



(n-l) ^(k + 2)(k+l)b k , y 
2(n + l)! 



tjrfifc+l+y) 



(n-ljr^ + n+l)^ r(fc + 3) 
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When y = I this directly becomes 

E 



2 1 

' (*»-!)- 



n-1 



n+1 



In the case of y ^ 1 we use the following relation (easily verified by induction 
when z 7^ y) 



n-l 



T(k + y) T(n + z)T(y+\)-T(z+l)T(n+y) 



r(z + l)T(n + z)(z-y) 



(7) 



to derive 



2r(n + 1 + y) - T(n + 2)T(y + 2) 2 - (n + 1) (y + 1 



(n-l)r(y + n+l)(y-l) 



(n-l)Cy-l) 



□ 



Lemma 3.4 Ain-><» /or positive x and y we have the following asymptotic re- 
sults 



where 



E[e- xUn ] ~T{x+\)n-\ 

J±*r(y+l)./r> i/0<y<l, 
~> 1 ~ ^ 2n~ 1 lnn, ify = l, 



"7 "J J x 5 



2T(x+l) 



,y» , ifQ<y<\, 
2r(x+l)n- x - 1 lnn, ify=\, 

ify>h 



C XJ = 2T(x + y+l)Y, 



\{k + 2)(k+\)b kiX 



+y 



Proof The stated results are obtained from Lemma 3.3 using the first of the 
following three asymptotic properties of the function b n , x 
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b n>x ~r(x+l)n x , 
l-(n +!)&„,; 



x- 1 



-^■h n — 1, x-)- 1, 



* (1 -&n,x) ->-^n, 

These three relations will often be used tacitly in what follows. 



□ 



4 Interspecies correlation 

Denote by W n the a-algebra containing all information on the Yule rc-tree. The 

(n) 

scaled trait values Yf ' := J,^^ , in view of Eq. (2), are conditionally normal 
with 



E 

Var 



Y\ n) \W n 



8e- aU ", 
1 _ e -2au % 

which together with the results from Section 3 entails 



E 
Var 



,(») 



r{n) 



l-8 2 bia + (S 2 -l)bn,2a. 



Lemma 4.1 In the framework of the Yule-Ornstein-Uhlenbeck model, for an ar- 
bitrary pair of traits we have 



Cov 



-2aS 



•j - e 



-2aU„ 



(n) 

where t-j is the backward time to the most recent common ancestor of the tips 

(U). 
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(n) 

PROOF Denote by FA the normalized trait value of the most recent common an- 
cestor of the tips (i, j). Let stand for the a-algebra generated by Y^. f ), 
then using Eq. (2) we get 



Cov 



y(") yWwW 



(n) 



(n) 



I j\ J/ ij 



implying the statement of this lemma 



Cov 



Var 



(") C \ 

-°4 J y.W|^ 



e - 2a ^f - e - 2aU \ 



□ 



Lemma 4.2 Consider the interspecies correlation coefficient, the unconditioned 
correlation between two randomly sampled trait values 



Pn = ( 1 n LE 

Ifa^ 0.5, then 

2a(n-l) + (n + l)((l+2a)b nj2a -r 



Cov 




1 


Var 


x (n) 


n{n 



Cov 


y(«) yW 
' ' •> 


Var 


y(«) 





Pn = 1 - 



(«-i)(2o-i)(i + (fi2-i)^-^ B ; 

and in ?/ze case o/oc = 0.5 

n + 1 n + 2 — 2/z„ 

Pn= ~n-\n + 8Hl-(n+l)bl 05 Y 

PROOF According to Lemma 4.1 we have, 
2 



n(n — 1) 



i)£ Cov 



yW yO) 



e -2axW _ e -2aU„ 



+ 8 2 Var[e- aU "} 
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leading to 



Pn = 1 



1-E 


' e -2aW 




1 -E[e~ 2aU " 


+ 8 2 Var[e- aU »] 



Applying the results of Section 3 we arrive at the asserted relations for p„. Observe 
that asymptotically as n — > °° the interspecies correlation coefficient decays to 0 
as 



Pn 



i^r(l + 2a) + 8 2 (r(l + 2a) - T 2 (a + 1)) rr 2a 0 < a < 0.5, 
2n~ l \nn a = 0.5, 

a > 0.5. 



2 n- 1 



2a- 1 



□ 



Lemma 4.3 Consider the sample mean Y n = n - l (Y^ n) + ... + Yi n) ) and the sam- 
ple variance 



1=1 



of the scaled trait values. For all a > 0 we have E [y„] = 8b na . For a ^ 0.5 



r , l+2a-(4an + l+2a)b n , 2a , ,2 , 

Var [ y «] = 7^ — + $ (bn,2a-bi a ), 



B[E%]=1 + 



(2a-\)n 
'l + 2a)(n+l)b n>2 a-2 



(2a-l)(n-l) 
and in the singular case a = 0.5 



Var[7 M ] 
Proof Obviously, E [7 n ] = E 



2(^-1) . 8 2 -l 2 2 



n-1 



5^,3, a ■ To prove the other assertions we 
turn to [26], where the concept of interspecies correlation was originally intro- 
duced. It was shown there that the variance of the sample average and the expec- 
tation of the sample variance can be compactly expressed as 
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Var[X„] 



E[sl]=(l-p n )Var 



1 n-1 

- H 

n n 



Pn Var 



X 



(») 



(»)' 



Since Var 



2a 



Var 



X 



(«)' 



, Var [y„] = g Var [Z„] , and E [Z^] = ^ E 



it remains to combine Lemma 4.2 with the known expression for Var 

A more direct proof of Lemma 4.3 can be obtained using the following result 
on conditional expectations. 

□ 



Lemma 4.4 We have 



E[Y n \%}=8e- aU ", 



E 

Var 



Y 2 W 

1 nl^n 



n _1 + (l-n _1 )E 



n _1 + (l-n _1 )E 



e- 2a ' {n) \% 
e- 2a " [n) W n 



— e 



e -2aU n + 5 2 e -2aU % 
2aU n 



PROOF The main assertion follows from 



Var 



Y™ + ... + Yr\&k =n(l 



- 2aT o n) - e - 2aU >^ 



H-«V 2 *+«(n-l)E 



□ 



5 Proof of Theorem 2.1 



r{x) ._ 



b n l ■ e xU " with E 



1. For <my x > — 1 the se- 



Lemma 5.1 Put V„ 

quence {Vn ,^n}n>0 farms a martingale converging a.s. and in L 2 . Moreover, 
(U n — logn) converges in distribution to a random variable having the standard 
Gumbel distribution. 
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PROOF The martingale property is obvious 



V. 



to 



Kkx ■ e- XU " E [^- Vr " +1 ] = Kl ■ e~ xUn = V, 



to 



Since the second moments 



are uniformly bounded over n, we may conclude that Vn V^ a.s. and in I? 



b n ,2x 

(M 2 

to 



with E 



yto 



1 . It follows that E 



Vn 



—7- 1 , and therefore, E 



e -x(U n -logn) 



T(x+ 1). The latter is a convergence of Laplace transforms confirming the stated 
convergence in distribution. 

□ 

Observe that the Gumbel limit for U n — logrc can be obtained using the classical 
extreme value theory, in view of the representation 

u n = Y, rlR i = max(£i,...,£„) 
(=1 

in terms of independent exponentials with parameter 1. Notice also that U n +i/2 
has the same distribution as the total branch length of Kingman's n-coalescent. 

Lemma 5.2 Denote by & n the o-algebra containing information on the Yule n- 
tree realization as well as the corresponding information on the evolution of trait 
values. Set 

H n : = {n+\)e^- x)Un Y m n>0. 
The sequence {H n , ^ n }n>o forms a martingale with E [H n ] = Hq = 5. 

Proof Notice that, 



,(a-i)r„+i "y y.( n+1 )|jr n 



W i ,-ir v(") 



7=1 



"+ 1 "+ 1 ^w_ ("+ 1 ) 2 Fr 



n + 2 



n + 2 
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Hence 



E[H n+1 \& n ] 



n + 2 ( a _ r 

7 e 

n+1 



u„- 



g (a-l)r„ +1 y^(»+i)|j- n 
(=l 



□ 



Lemma 5.3 For all positive a we have Var 



e- 2axin) W n 



0{n ^) as n 



(n) (n) 

PROOF For a given realization of the Yule n-tree we denote by tJ ; and two 
independent versions of corresponding to two independent choices of pairs of 
tips out of n available. We have, 



-20£TW 



Writing 



(2) 



f(a,k,n) 

and using the ideas of Section 3 we obtain 



k+1 



a + k+\ a + n 



On the other hand, 



n-l 

L/4a(M)^ 

n— 1 n— 1 

+ 2 ^ H f2a{hM)fAa{k2,n)7ln,ki 7l n,h- 
k\=\ k 2 =k\ + \ 



k\ k 2 



Taking the difference between the last two expressions we find 
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Var 



£ {Ua{Kn)-f2a{Kn) 2 )nl k 



n—1 n—1 



+ 2 £ £ f2a{hM)[Ua{h,n)- f 2 a{h,n) 2 }Jl, hkl K n k 2 . 
ki=\k 2 =h+\ 

Using the simple equality 

n 

a\---a n -b\---b n = £ b\ ■ ■ ■ bi-\ (a; - bi)a i+ \ • ■ ■ a n 



!=1 



we see that it suffices to prove that, 



.-4\ 



n-l 
k=l 

n—1 n—1 

£ £ f2a(kiM)A n ,k 2 Xn,k x K n,k 2 = 0(n 
k\ = l k 2 =ki + l 



-3\ 



where 



A n ,k-= £ /2a(^j) 2 ( 2a ^ +1 ) MM- 



j=k+l 



To verify these two asymptotic relations observe that 



K,k< 



k+l n " 4a 2 Aa 2 b n - 



< 



'Act 



y — 

Aa + k+l 4a+n (2a + i) 2 b kM i Jj^ 1 kb k 



1 , 4a 2 b n A a 



< 



Act 



Since 7tn >k = {n _$$ {k+l) , it follows 



n-l 

Y, A "-k n lk < cibnAa £ -7X7- < c 2 n- Aa £ n 4a - 5 < c 2 n 

k=l k=l K ° k A a k=l 



n-l 



and 



n— 1 n— 1 n— 1 n— 1 ^ j 

£ £ flaihM^nfaXnMXnM < ClKAa £ £ T 

k 1 =lk 2 =k 1 +l ki=lk 2 =k x +l D ki,2a°k 2 Aa k\k 2 

< c 4 n- 4a £ k 2a - 3 £ k 2a - 2 < c 4 n- 4a £ kf- 4 < c 4 n 



-3 



k 2 =2 



fc,=l 



k 2 =2 
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□ 

Proof of Theorem 2.1 (i) and (ii). Let a > 0.5. To establish the stated 
normal approximation it is enough to prove the convergence in probability of the 
first two conditional moments 



(^l n ,a 2 ) ■= (v^E [Y n \&„] , nVar \Y n \%\ ) 4 (0 



2a + 1 
2a -1 



n — > °° 



since then, due to the conditional normality of Y n , we will get the following con- 
vergence of characteristic functions 



— > e 2 < 2, 



2a+l v 2 
2(2a-\) I 



Now, due to Lemma 4.4 we can write 



ne 



-2aU„ 



(Ll n ,oZ) = (Vn8e- aU ", l + (n-l)E 

Using relations from Section 4 we see that 

r 2i , , , 2-(n + l)(2a+l)b naa 2a + l 
E[a n ]=l -nb n , 2a + — — 

It remains to observe that on one hand, according to Lemma 5.3 



)• 



l + (w-l)E 



p 2a+\ 
~* 2a -1 



and on the other hand, ne 2aU " A 0, implying that o 2 A 2a-\ ■ ^ n ^ s to g emer with 

fl n — > 0 holding in L 2 and therefore in probability, entails (fi n ,a 2 ) 4 (0, §§zt)> 
finishing the proof of part (i). Part (ii) is proven similarly. 

Proof of Theorem 2.1 (in). Let 0 < a < 0.5. Turning to Lemma 5.2 ob- 
serve that the martingale H n = (n+ l)e^ a ~ l ' u "Y n has uniformly bounded second 
moments. Indeed, due to Lemma 4.4 



E[H 2 ]=(n+l) 2 E 



e 2(a-l)t/„ E 



Y n \'3 / n 



< c\nE 



,-2{\-a)U„ 



+ C2« E 



,-2{\-a)U n -2a&> 



+ c 3 n 2 E[e- 2aU "] 
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Thus, according to Lemma 3.4 we have supE [H~\ < °°. Referring to the martin- 

n 

gale L 2 -convergence theorem we conclude that H n — > Hoo almost surely and in L . 
Due to Lemma 5.1 it follows that 

n a Yn = n a b n%a -i yi«-i) H ^ V ("-D Haa =: Ya 8 a.s. and in L 2 . 
n+1 

Finally, as n — > °° 



i a E[Y n ] = 8n a b n , a ^8T{l + a) 



n 2a E 



Y 2 
1 n 



n 2a - l +n 2a (l-n- l )E e" 2 ^ + n 2a (8 2 - 1)E [e- 2aU »] 



6 Consistency of the sample variance 

Recall that E [S 2 ] = j- E [D 2 ~\ , and according to Lemma 4.3 we haveE[D 2 ] 1. 
The aim of this section is to show that Var [Z) 2 ] — >• 0 as n — > °° which is equivalent 
to 

E[Z)J]-H, n^oo. (g) 

To this end we will need the following formula, see Eq. (13) in [6] valid for 
any normally distributed vector (Zi,Z2,Z3,Z4) with means (mi, 1712,1113,1714) and 
covariances Cov [Z,Z y ] = cif. 

Cov [ZiZ2,Z3Z4] = mimjC24 + mini4C23 +m2tnT,ci4 -\-m2m4C\3 + C13C24 + C14C23. 
In the special case with m ; - = m it follows 

E [Z1Z2Z3Z4] = m 4 + m 2 (ci2 + C13 + c i4 + C23 + c 2 4 + c 34 ) + ci 2 c 34 + ci 3 c 2 4 + ci 4 c 2 3- 

(9) 

(n) 

Writing Yj instead of F- v we use the representation 
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to find out that 



E M = i(E E I 7 / 1 ] +2 EE E [ y / 2y /] 



4 — il 

;>! i ;>( ; j>ik^i,j 



^7— n £ Ee [yM + E I E WT\ +EE E E Ww] 



+ ^hw (EE E + E E E E + E E E E Wp t ] 



+11 I E E[y ;Ww ] 

i 7>i k^i,j m>k; m^ij 

Denoting by (Wi, W2, W3, W4) a random sample without replacement of four trait 
values out of n available, so that 



E[wt]=n- 1 l t E[Yf], 

i 

E [W?W 2 ] = ^-fyEE E l r l r j] • 

e [^ 2 ] = ^T)EEb[^/]. 

E [WMW 3 ] = I II £ B [if ^ , 

E W 1 ™] = »(»-1)(h'-2)(»-3) EE E IE [SW „] 

v /v ;v ' 1 J^i kytij m^i,j,k 

we derive 



E tojl = n" 1 E [W^l — 4n _1 E \W?W 2 } + - . 2 "^ 3 E [wfwfl 

n(n — 1) 

2(n-2)(n-3) r , , (n-2)(n-3) r 

We compute the five fourth-order moments in the last expression using the 
conditional normality of the random quadruple (Wi, W2, W3, W4) with conditional 
moments given by 
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E[Wi\& n } = 8e- aU », 
E[W?\%] =l + (8 2 -l)e- 2aU ", 
Var[W i \& n } = l-e- 2aU >>, 



Cov [W^Wjl&n] =E 



-20!T;; ' I 

e 'J 



>- 2aU \ ije {1,2,3,4}, i^j, 



where x\" ,m> is the time to the most recent ancestor for the pair of tips (z, j) among 



m randomly chosen tips of the Yule n-tree. Clearly, all t/ ; "' 4 ' ) have the same distri 
bution as T^ n \ and for 



(n) „ 

v^:=E 



-2a^ 4) , 



we can find the asymptotics of 



(n) 


= E 


" -2arW 


, E 


v (n) -2aU n 


= E 


ij 








v ij e 





-IcOJn-lcnW 



using Lemma 3.4. Notice also that 



(v£°) 2 ' 



n — y °°. 



This follows from Lemma 5.3 and Lemma 3.4 as 



'e- 2 ^ (n) W n 



Var 



+ E 



-2«t(") 



(i) With Zi=Z 2 =Z 3 =Z 4 = W\ in Eq. (9), we obtain 



1 



5 4 e- 4aU "+65 2 e- 2aU "(\ - e - 2aU ") + 3(1 - e - 2aU "f 



and therefore E \W 4 ~\ — > 3 as n — > °°. 

(ii) Using Eq. (9) with Z\ = Z 2 = Z3 = W\ and Z4 = W2 we obtain 

E [W?W 2 \& n ] = 8 4 e- 4aU "+38 2 e- 2aU "(l-e- 2aU ") 

+ 35 2 e- 2aU "(v[f -e- 2aU ")+3(l-e- 2aU ")(v^ -e- 2aU 
= 3vg - 3(8 2 - 1) (1 - v[f)e- 2au " + (8 4 - 68 2 + 3) e - 4aU » 
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resulting in E [Wj [W2] — > 0 as n — > 00. 

(iii) Eq. (9) with Z\ = Z2 = Wi and Z3 = Z4 = W2 gives 

E [flfw 2 ^] = 8 4 e- 4aU " + 28 2 e- 2aU "(l-e- 2aU ") 



12 



_l_4g2 g -2a£/ B £ v (») _ e -2aU„^ + ^ _ g -2a[/„^2 + 2(V" ! - -^ 2af/ '' 
: l + 2(5 2 -l)e- 2a(/ » 



+ (5 4 -65 2 + 5) e - 4a ^ + 4(5 2 -l)v 1 ( ^- 2af/ » + 2(vg) 2 , 
so that E [Wf W 2 2 ] -> 1 as w 00. 
(iv) Using a consequence of Eq. (9), 



E [Z 2 Z 2 Z 3 ] = m 4 + m 2 (cn +2ci2 + 2ci 3 +C23) + C11C23 +2ci 2 ci3, 



we get 



E [wfW 2 W 3 |^i] = 8 4 e- 4aU " + 8 2 e- 2aU "(\ - e - 2aU ") 



+ 5 2 e -2aU n{2v (n) +2v (n) + 



,(») 



+ (!-« 



-2aU„ 



12 

)(V; 



(«) 
23 



'13 -rV£'-5<j- 



-2aU n \ 



-2aU„^ 



+ 2(v 



(») 



'12 

: (S 2 - \)e- 2aU " + (8 4 - 68 2 + 3)e- 4aU " + 2v# v 13 
+ 2(5 2 - l) e - 2at/ »(vg + vg } ) + (1 + (S 2 - lK^Jvg. 



-2aU n \ 



Using the Cauchy-Schwarz inequality 



0<E 



V 12 v 13 



< E 



,(«)' 
12 



we obtain E [W^WWa] — >• 0 as n — > 00. 
(v) According to Eq. (9) we have 
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E[WiW 2 W 3 W 4 |%] = 8 4 e- 4aU " 

+ 8 2 e- 2au " (vg + vff + vj? + + v& } + v£ } - 6e- 2at/ ») 

+ ( v W_ e -2a f /„ )(v W_ e -2a(/„ ) 
+ (v W_ e -2af/ n)(v W_ e -2at/„ ) 

+ (vff- e - 2o:t/ »)(vg ) - e - 2o:(/ »), 

implying 



E[WiW 2 W 3 W 4 |^] = (5 4 -65 2 + 3)e- 4o:t/ " + v 1 ( 2 ) v5 ) + v 



W (») , ■ 

12 V 34 + Vl " V 



.(«) 
12 



(») 
34 



, , i/M , (w) , (w) . (n) , (n)\ 



(») (») 

V 12 V 34 



V 12 V 34 



Using an estimate for E 
we used in (iv), we find E [W1W2W3W4] — > 0 as n — > °°. 



M2 v 34 



similar to that 



Finally, putting the above results (i) - (v) into Eq. (10) we arrive at Eq. (8). 
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A All moments of U n and v n > 

Eq. (3) for the Laplace transforms of the random variable U n can be used to 
calculate the moments of U n using, 

E[f/„ m ] = (-l) m (d m E [e- xU »] /dxT)\ x=Q . (11) 
For a fixed n we introduce the following notation, 

A(x) 1 1 



x + 1 x+n 

M*) = 71 , ^ + • • • + 



m 



(x+l) ra (x+n 
b m (x) = (fci(x),...,fc m (x)). 

Notice that A (0) = 1/n! and b m (0) = H n%m is the n-th generalized harmonic num- 
ber of order m, 

n y 

H n ,m = Y*pi- (12) 
(=1 ' 

We can write Eq. (3) as E [e _x(/ «] = n\A(x). Its first derivative with respect 
to x is — n\A(x)b\{x), and the second derivative is n\A(x)(b\(x) 2 -\-biix)). For the 
general recursive formula we introduce the following notation. We will denote 
by k = (&i,&2> • • •) infinite dimensional vectors with integer- valued components, 
and write k G srf m if all k{ > 0 and |k| := Y4L1 hi = m - Therefore &/ m represents 
the set of all possible ways to represent m as a sum of positive integers. We will 
also use the multi-index notation b m (x) k = b\ (x) kl ■ b m (x) . 

Since A'(x) = —A{x)b\{x), and b' m (x) = —mb m+ \[x), we can show by induc- 
tion that, 



^—E [e- xU »] = (-l) m n\A(x) £ c k b m « k , (13) 

where coefficients Ck are defined for all vectors k = (^1,^2,.- •) with integer- 
valued components using the recursion, 

in 

Ck= £(j^ + l)c kJ , (14) 

j=0 
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with m = |k| and 

^,0 = ^-1^3,...), 

Ckj = C( felr .. ) ^ + i ) ^. +1 _i r ..), j> 1. 

The boundary conditions for the recursion of Eq. (14) consist of two parts: 

• Ck = 0, if all ki = 0, or one of the coordinates of the vector k is negative, 

• Ck = 1 if k\ > 1 and all other ki = 0. 
We conclude from Eq. (13) that, 



The technique for calculating the m-th derivative of the Laplace transform of 
fW given by Eq. (5) is the same but requires new notation 



My) 



1 



1 



1 



y-1 y + 2 y + n 
% 1 1 



(y+n) 



Notice that A' (y) = -A(y)bi(y), b' m (y) = -mb m+1 (y),A(0) = -n\ and b m (0) = H n>l 
if m is even or b m (0) = H n ^ m — 2 if m is odd. One can then inductively show that, 



dy n 



(-l) m 2m! 



(n-l)Cy-l)'"- 1 
+ £ c k b ra (v) k ), 



(-l) m+1 (n + l)! 



A(y)(Bi(y)' 



with the coefficients Ck defined as previously by Eq. (14). Therefore, we get, 



2m! 
n-l 



ke,e4 



(' odd 



;=1 
i even 
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Similarly we can use Eq. (6) to calculate the joint moments for U n — and 
T" in terms of, 



x + i + 1 jc+ j 

1 1 



W , (x) = 7 — 1 - i 1N - + - + 



(> + * + l) m ' (* + ./)' 
For m > 1 and r > 1 we first get, 



dx m dy' 



E 



e -^([/„-TW)-yTW 



n-1 



X 



and then from the above, 



( [ / n _ T («))m T (») r = (_!) 



2(n+i; 
n-1 



n— 1 j ' m 



X 



i ()+1 ; (/+2) 1 1 *n*j i c >n(».r%)* 
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